A Sketch Engine-Based Study of Chinese Synonym Discrimination (基于SKETCH ENGINE的汉语近义词辨析研究)
Abstract
This study aims to describe the similarities and differences in collocation behavior and semantic preferences between two synonymous Chinese verbs, hulüe (忽略) and hushi (忽视). The data derive from the Chinese Web 2017 (zhTenTen11) Simplified corpus in Sketch Engine. The study applies both quantitative and qualitative methodologies. The results show that the two words often collocate with abstract nouns, adverbs, verbs, adjectives, and prepositions (“zai 在”, “gei 给”, “cong 从”, “yu 与”, “dui 对”, “bei 被”, “ba 把”, “jiang 将”). Some of these collocate only with “hulüe” or only with “hushi”; others collocate with both but, judged by frequency of occurrence, tend to co-occur with one of them. With reference to the UCREL Semantic Analysis System, nouns classified in the “money and commerce in industry” domain tend to co-occur with “hulüe”, whereas nouns classified in the “body and the individual” domain (especially “health and diseases” and “pharmacy and health care”) and in the “education” domain tend to co-occur with “hushi”.
Keywords: Sketch Engine; synonym; discrimination; hulüe; hushi; collocation behavior; semantic preference
 
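Abstract (translated from the Chinese) Using the Chinese Simplified corpus in the online corpus tool Sketch Engine, this paper discriminates between the near-synonymous Chinese verbs “忽略” (hulüe) and “忽视” (hushi) in terms of collocation behavior and semantic preference. The study combines qualitative and quantitative analysis. The analysis shows that “忽略” and “忽视” frequently collocate with abstract nouns, adverbs, verbs, adjectives, and prepositions (“在”, “给”, “从”, “与”, “对”, “被”, “把”, “将”). Some of these collocate only with “忽略” or only with “忽视”; others collocate with both, but by frequency of occurrence some lean toward co-occurring with one of the two. With reference to the UCREL Semantic Analysis System, nouns classified in the “money and commerce in industry” domain tend to collocate with “忽略”, whereas nouns classified in the “body and the individual” domain (especially those relating to “health and disease” and “medicines and medical treatment”) and in the “education” domain tend to collocate with “忽视”.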
Journal
Journal title: Bambuti
Year: 2023
ISSN: 2797-2232
DOI: https://doi.org/10.53744/bambuti.v5i1.47